Two novel T cell epitope prediction 
algorithms based on MHC-binding 
motifs; comparison of predicted and 
published epitopes from Mycobacterium 
tuberculosis and HIV protein sequences 

Gabriel E. Meister, Caroline G.P. Roberts, Jay A. Berzofsky and 
Anne S. De Groot 

We have designed two computer-based algorithms for T cell epitope prediction, OptiMer 
and EpiMer, which incorporate current knowledge of MHC-binding motifs. OptiMer 
locates amphipathic segments of protein antigens with a high density of MHC-binding 
motifs. EpiMer identifies peptides with a high density of MHC-binding motifs alone. 
These algorithms exploit the striking tendency for MHC-binding motifs to cluster within 
short segments of each protein. Putative epitopes predicted by these algorithms contain 
motifs corresponding to many different MHC alleles, and may contain both class I and 
class II motifs, features thought to be ideal for the peptide components of synthetic 
subunit vaccines. In this study, we describe the use of OptiMer and EpiMer for the 
prediction of putative T cell epitopes from Mycobacterium tuberculosis and human 
immunodeficiency virus protein antigens, and demonstrate that these two algorithms may 
provide sensitive and efficient means for the prediction of promiscuous T cell epitopes that 
may be critical to the development of vaccines against these and other pathogens. 

Keywords: MHC-binding; T-cell epitopes; vaccine 



The cellular immune response to pathogens depends 
upon the presentation and recognition of their protein 
antigens in the form of intracellularly processed 
peptides, bound to class I or class II major 
histocompatibility complex (MHC) molecules and 
expressed at the cell surface. Peptides presented in 
conjunction with MHC class I molecules are derived 
from antigens synthesized in the cytoplasm, and are 
generally from 8 to 10 amino acids in length [1]. Peptides 
derived from exogenous antigens are usually presented 
in the context of MHC class II molecules, and range in 
length from around 10 to over 20 amino acids. 

The identification of those peptides that stimulate T 
cell responses, termed T cell epitopes, is essential to the 
development of successful vaccines. Several methods 
have been employed to locate T cell epitopes within the 
amino acid sequences of viral and bacterial protein 
antigens. One common approach has been to 
synthesize overlapping peptides which span the entire 
sequence of a protein antigen. These overlapping 
peptides are then tested for their capacity to stimulate T 
cell proliferative or cytotoxic responses in vitro. 
While the overlapping peptide method is thorough, it is 
both cost- and labor-intensive; for a given protein of 
length n amino acids, (n/10) - 1 peptides that are 20 
amino acids long (20-mer) and overlap by 10 amino 
acids would need to be synthesized to employ this 
method of epitope identification. 
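
The peptide count quoted above can be checked with a short sketch. The function name, the 0-based half-open coordinates, and the handling of a ragged C-terminal end are assumptions made for illustration; the text specifies only 20-mers overlapping by 10 residues.

```python
def overlapping_peptides(sequence, length=20, overlap=10):
    """Tile `sequence` with peptides of `length` that overlap by `overlap`.

    Returns (start, end) pairs, 0-based and half-open.
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    n = len(sequence)
    if n <= length:
        return [(0, n)]
    starts = list(range(0, n - length + 1, step))
    # Assumption: if the regular tiling stops short of the C-terminus,
    # add one final peptide ending at the last residue.
    if starts[-1] + length < n:
        starts.append(n - length)
    return [(s, s + length) for s in starts]


peptides = overlapping_peptides("A" * 100)
# For n = 100 this gives (100/10) - 1 = 9 peptides, 180 residues in total.
print(len(peptides), sum(end - start for start, end in peptides))
```

For sequence lengths that are not a multiple of 10, the count from this sketch can differ by one from (n/10) - 1, depending on how the final peptide is treated.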

Several computer-based algorithms have been 
designed to predict T cell epitopes from the amino acid 
sequences of proteins. Notably, the AMPHI algorithm 
searches a protein's primary structure for peptides with 
a high probability of folding as amphipathic 
structures. In a previous analysis of the predictive 
power of the AMPHI algorithm, 70% of published 
epitopes were shown to contain sequences that would 
have been predicted by AMPHI. Even as the 
number of known T cell epitopes has quadrupled since 
the advent of the AMPHI algorithm, 65% are 
amphipathic, and the correlation remains highly 
significant. A new structural basis for this empirical 
correlation, despite the lack of helicity in peptides as 
bound in the MHC groove, has recently been 
suggested. This has been based on two observations: 
first, that the influenza peptide bound in the groove of 
DR1 in the crystal structure of Stern et al. is a 
beta-strand with a 130° twist, giving it a hydrophobic 
periodicity similar to that of an amphipathic helix [15]; 
and second, that the spacing of hydrophobic anchor 
residues in the majority of MHC binding motifs fits the 
periodicity sought by AMPHI even though the peptides 
are not helical. Other epitope prediction algorithms 
which analyze protein sequences for specific secondary 
structural or sequence characteristics [16] generally 
search for a spacing of hydrophobic residues similar to 
that searched for by the AMPHI algorithm. 

Another approach to T cell epitope identification has 
been to search protein sequences for regions that 
contain MHC-binding motifs, amino acid motifs 
found in a large proportion of peptides that bind to 
specific MHC alleles. Such motifs have been derived 
using two methods. Primarily, large pools of "naturally 
processed" peptides, derived from endogenously 
processed proteins, are acid-eluted from membrane-bound 
MHC molecules and sequenced. These 
sequences, when aligned, have been shown to 
incorporate certain amino acids at specific positions; 
these "anchor" residues are thought to facilitate peptide 
binding within the MHC binding groove [1]. The restricted 
patterns of anchor residues, termed "motifs", differ for 
peptides eluted from different MHC alleles, suggesting 
that the interaction of the anchor amino acids with the 
surface of the MHC molecule determines the MHC 
specificity of immune response to peptide epitopes. 

MHC-binding motifs have also been deduced by 
alignment of published allele-specific T cell epitope 
sequences. The literature now contains motifs for a 
wide variety of human class I, class II, and murine 
MHC alleles. Certain motifs have been 
demonstrated to accurately predict T cell epitopes from 
primary structures. However, the predictive 
capacities of single MHC-binding motifs vary; in some 
cases, peptides which bind MHC molecules lack 
correlation with motifs, and not all peptides containing 
these binding motifs are immunodominant. 

Our laboratory has developed two novel algorithms, 
OptiMer and EpiMer, designed to predict T cell 
epitopes from protein primary structures. OptiMer 
takes into account both amphipathicity and MHC-binding 
motifs, while EpiMer focuses on the location of 
MHC-binding motifs alone. 

Using published MHC-binding motifs, OptiMer 
examines the amino acid sequences of proteins and 
generates a list of peptides that contain these motifs; the 
algorithm then identifies peptides that would be 
amphipathic if folded as a helix or twisted as a 
beta-strand, using the AMPHI algorithm. These potentially 
amphipathic peptides are compared to the list of 
MHC-binding motif matches. OptiMer extends the predicted 
amphipathic peptides to maximize the density of 
MHC-binding motif matches per length of protein region. 

The EpiMer algorithm searches protein amino acid 
sequences for MHC-binding motif matches, generating 
a list of matches for each protein. The algorithm then 
identifies clusters of MHC-binding motifs, predicting 
putative T cell epitopes based on the relative density of 
these motifs. A striking observation demonstrated here 
is the tendency of MHC-binding motifs to cluster within 
protein sequences, creating regions more promising for 
use in synthetic subunit vaccines. 

These two novel algorithms, OptiMer and EpiMer, 
were used to predict putative epitopes in five 
Mycobacterium tuberculosis (Mtb) protein antigens (14, 
16, 19, 38, and 65 kDa) and three human 
immunodeficiency virus (HIV) protein antigens [nef, 
gp160, and reverse transcriptase (RT)]. To evaluate the 
new algorithms' predictive power, we have compared 
OptiMer- and EpiMer-predicted epitopes, AMPHI-predicted 
epitopes, and peptides that would have been 
synthesized using the "overlapping peptide" method, to 
a selection of T cell epitopes that have been published 
for each of these eight proteins. 

METHODS 

MHC class I- and class II-specific binding motifs were 
collected from the literature (Table 1). In total, 15 
distinct class II and 19 distinct class I motifs were used 
by the OptiMer and EpiMer algorithms. Protein 
primary sequences were searched for peptides that 
contained each MHC-binding motif using the DataMao 
v2.02 text processor (Andrew Thomas-Cramer, 
FeatherSoft, Madison, WI), generating a complete list 
of motif matches for each protein. 
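
As an illustration of this search step, the sketch below encodes a motif as allowed or excluded residues at given peptide positions and scans a sequence for every match, including overlapping ones. The motif encoding, the function names, and the toy 9-mer motif are assumptions made for the example; they are not the text processor or the motif list used in the study.

```python
import re


def motif_to_regex(length, allowed=None, excluded=None):
    """Compile a position-specific motif into a regular expression.

    `allowed` and `excluded` map 1-based peptide positions to strings of
    one-letter residue codes; unconstrained positions accept any residue.
    """
    allowed = allowed or {}
    excluded = excluded or {}
    parts = []
    for pos in range(1, length + 1):
        if pos in allowed:
            parts.append("[" + "".join(sorted(allowed[pos])) + "]")
        elif pos in excluded:
            parts.append("[^" + "".join(sorted(excluded[pos])) + "]")
        else:
            parts.append(".")
    # A lookahead reports every match, even when matches overlap.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motif_matches(sequence, motifs):
    """Return sorted (start, end, motif_name) tuples, 0-based half-open."""
    matches = []
    for name, pattern in motifs.items():
        for m in pattern.finditer(sequence):
            matches.append((m.start(), m.start() + len(m.group(1)), name))
    return sorted(matches)


# Hypothetical motif: a 9-mer with L or M at position 2 and V or L at position 9.
toy_motifs = {"toy-9mer": motif_to_regex(9, allowed={2: "LM", 9: "VL"})}
print(find_motif_matches("ALKKKKKKVLKKKKKKL", toy_motifs))
```

Running each published motif over a protein in this way yields the per-protein list of motif matches on which both algorithms operate.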

Eight protein sequences were obtained from the 
Protein Identification Resource (National Library of 
Medicine): A43589 (14 kDa Mtb antigen); A43823 (16 
kDa Mtb antigen); S02753 (19 kDa Mtb antigen); 
P15712 (38 kDa Mtb antigen); A26950 (65 kDa Mtb 
antigen); P04582 (HIV-1 BH8 gp160); P03406 (HIV-1 
BRU nef); and P03367 (HIV-1 BRU RT). The HIV 
protein RT corresponds to amino acid residues 168-728 
of the pol gene product. MHC class II-restricted T cell 
epitopes for the five Mtb proteins, and MHC class I- 
and class II-restricted epitopes for the three HIV 
proteins, were compiled from the literature (as an 
example, see Figure 1a). 

Published epitopes that had been identified with the 
aid of the AMPHI algorithm were excluded from the list 
of published epitopes compiled for this study, to avoid 
introducing bias in favor of either AMPHI or OptiMer, 
both of which are based on the identification of 
amphipathic structures. For the three HIV proteins 
analysed, only epitopes published for the strains of HIV 
noted above, and that had been identified in either 
human or murine models, were included in our analysis. 
(As gp160 derived from the HIV strain BH10 is 99% 
homologous to that of the BH8 strain, gp160 T cell 
epitopes identified in the BH10 strain of HIV were also 
included, with appropriate modifications in amino acid 
numbering.) 

The OptiMer algorithm lengthens regions of proteins 
predicted by the AMPHI algorithm [11] in order to 
generate peptides with a maximal density of MHC-binding 
motif matches (Figure 1b). Briefly, the 
algorithm compares each potentially amphipathic 
segment, of length n amino acids, to the list of MHC-binding 
motif matches for the same protein. The 
amphipathic regions are lengthened at both their N- and 
C-termini until a maximal density, d, of MHC-binding 
motif matches (d = m/n, where m = the number of 
included motif matches) is reached, generating an 
OptiMer-predicted peptide (pred). OptiMer predictions 
can be customized using lists of motif matches specific 
to MHC class I or class II alleles, or both; likewise, 
these lists can include or exclude non-human MHC-binding 
motifs. 
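
The extension step can be sketched as a small search over N- and C-terminal extensions that maximizes d = m/n. Several details below are assumptions not stated in the text: a motif match counts toward m only if it lies wholly inside the segment, extensions are capped at a hypothetical `max_ext` residues per side, and ties are resolved in favor of the first candidate found.

```python
def motif_density(start, end, matches):
    """d = m / n for the half-open segment [start, end).

    m counts motif matches (start, end) lying wholly inside the segment.
    """
    m = sum(1 for s, e in matches if s >= start and e <= end)
    return m / (end - start)


def extend_segment(start, end, matches, seq_len, max_ext=10):
    """Extend an amphipathic segment at both termini to maximize d.

    Returns (density, new_start, new_end). `max_ext` is a hypothetical
    cap on the extension allowed at each terminus.
    """
    best = (motif_density(start, end, matches), start, end)
    for left in range(max_ext + 1):
        for right in range(max_ext + 1):
            s = max(0, start - left)
            e = min(seq_len, end + right)
            d = motif_density(s, e, matches)
            if d > best[0]:
                best = (d, s, e)
    return best


# Toy data: an amphipathic segment at residues 10-19 and four motif matches.
toy_matches = [(8, 17), (9, 18), (12, 21), (30, 39)]
print(extend_segment(10, 20, toy_matches, seq_len=50))
```

In the toy example the unextended segment contains no complete motif match, while extending two residues toward the N-terminus and one toward the C-terminus captures three.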

The OptiMer algorithm allows the user to define a 
level of motif density below which modified 
amphipathic peptides are excluded from the final list of 
putative epitopes. As this requisite level of motif density 
is increased, the number of OptiMer-predicted peptides 
which would potentially be synthesized, and thus the 
number of amino acid residues required to construct 
these peptides (n_pred), decreases. OptiMer-predicted 
peptides with a motif density d greater than the mean 
motif density d_mean were chosen for the analysis 
presented here, as they showed the strongest correlation 
to T cell epitopes published in the literature (data not 
shown). 

The EpiMer algorithm uses the custom list of MHC-binding 
motif matches generated for a given protein 
antigen to construct a motif density "map" or histogram 
(Figure 2). By stepping a reading frame of length r one 
amino acid at a time through the protein primary 
structure, the algorithm determines the motif density d 
for each peptide of length r within the protein. Given a 
user-defined minimum density value d_min, itself a sum of 
the protein's mean MHC-binding motif density d_mean and a 
positive or negative multiple of the density's standard 
deviation, EpiMer extracts only those motif-dense 
"clusters" with d > d_min. Finally, the algorithm uses a 
"threading value" t to link selected clusters into 
contiguous peptides, depending on their distance apart 
in the amino acid sequence. (As an example, t = 5 would 
assure that motif-rich clusters from 1 to 5 amino acids 
apart would be linked into the same predicted peptide, 
but that clusters 6 or more amino acids apart would not 
be thus linked. The technique of threading was 
implemented to avoid the generation of multiple 
peptides overlapping the same short region of a 
protein.) These clusters of MHC-binding motifs 
constitute the EpiMer algorithm's predictions for 
putative T cell epitopes (Figure 1c). EpiMer searches, 
like OptiMer searches, can be tailored to include or 
exclude non-human motifs, and to search for one class 
or both classes of human MHC-binding motifs. 
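
The three EpiMer steps (density map, thresholding, threading) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: density is taken as the count of motif matches lying wholly inside each window, the threshold is the mean plus `k` population standard deviations, and the distance between clusters is the number of residues separating them. The parameter name `k` is hypothetical.

```python
from statistics import mean, pstdev


def window_densities(seq_len, matches, r=11):
    """Count motif matches wholly inside each length-r window."""
    return [sum(1 for s, e in matches if s >= i and e <= i + r)
            for i in range(seq_len - r + 1)]


def epimer_sketch(seq_len, matches, r=11, k=1.0, t=5):
    """Return predicted peptides as (start, end), 0-based half-open."""
    dens = window_densities(seq_len, matches, r)
    # d_min = mean density plus a (positive or negative) multiple of the SD.
    d_min = mean(dens) + k * pstdev(dens)
    clusters = [(i, i + r) for i, d in enumerate(dens) if d > d_min]
    peptides = []
    for start, end in clusters:
        # Thread clusters that overlap or lie at most t residues apart.
        if peptides and start - peptides[-1][1] <= t:
            peptides[-1] = (peptides[-1][0], max(peptides[-1][1], end))
        else:
            peptides.append((start, end))
    return peptides


# Toy data: two groups of three overlapping 9-residue motif matches.
toy_matches = [(5, 14), (6, 15), (7, 16), (30, 39), (31, 40), (32, 41)]
print(epimer_sketch(60, toy_matches))
print(epimer_sketch(60, toy_matches, t=12))
```

With t = 5 the two motif-dense regions stay separate; raising t to 12 threads them into a single longer peptide, which shows how the threading value trades peptide number against peptide length.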

OptiMer- and EpiMer-predicted peptides were 
compared to previously published epitopes for each of 
the eight protein antigens studied. A positive correlation 
was defined as an overlap of at least 11 amino acids 
between an OptiMer- or EpiMer-predicted peptide and 
a published T cell epitope, in the case of class II 
epitopes, and as an overlap of at least 8 amino acids, in 
the case of class I epitopes. These overlap values were 
chosen to allow the inclusion of a representative class II 
and class I MHC-binding motif size, respectively. 
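
The overlap criterion translates directly into code. The sketch below assumes peptides and epitopes are given as 0-based half-open (start, end) intervals; the function names are illustrative.

```python
# Minimum overlap, in residues, required for a positive correlation.
MIN_OVERLAP = {"I": 8, "II": 11}


def overlap_length(a, b):
    """Number of residues shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def correlates(predicted, epitope, mhc_class):
    """True if a predicted peptide counts as matching a published epitope."""
    return overlap_length(predicted, epitope) >= MIN_OVERLAP[mhc_class]


print(correlates((7, 26), (10, 25), "II"))  # 15 shared residues
print(correlates((0, 12), (5, 20), "II"))   # 7 shared residues
print(correlates((0, 12), (4, 20), "I"))    # 8 shared residues
```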

For each protein, EpiMer- and OptiMer-predicted 
peptides were compared to peptides derived by 
simulating the overlapping peptide method most 



Table 1 Human class I- and class II-restricted MHC-binding motifs used by the OptiMer and EpiMer algorithms 

MHC-binding motif allele    Reference 

HLA-A1                      23 
HLA-A2.1(a)                 3 
HLA-A2.1(b)                 24 
HLA-A3                      25 
HLA-A11(a)                  26 
HLA-A11(b)                  3 
HLA-A11(c)                  3 
HLA-A68(a)                  26,27 
HLA-A68(b)                  26,27 
HLA-B8(a)                   3 
HLA-B8(b)                   3 
HLA-B8(c)                   28 
HLA-B8(d)                   23 
HLA-B27(a)                  3,30 
HLA-B27(b)                  3,30 
HLA-B35                     31 
HLA-B40                     26 
HLA-B53(a)                  31 
HLA-B53(b)                  31 
HLA-DQ3.1                   32 
HLA-DR1(a)                  33 
HLA-DR1(b)                  34 
HLA-DR1(c)                  35 
HLA-DR(2,5,7)               33,36 
HLA-DR2a/DR2b               37 
HLA-DR3/DRw52(a)            37 
HLA-DR3(b)                  38 
HLA-DR4(a)                  37 
HLA-DR4(b)                  34 
HLA-DR4w4                   39 
HLA-DR7                     37 
HLA-DR8                     37 
HLA-DRw11(5)                34 
HLA-DR17                    40 

[Position in peptide, columns 1-10: the permitted and excluded anchor residues listed for each motif are not recoverable from this copy.] 


Vaccine 1995 Volume 13 Number 6 583 



[Figure 1: panels show (a) the sequence with published epitopes and overlapping peptides, (b) AMPHI peptides and OptiMer peptides, and (c) the MHC-binding motif map (AA) with EpiMer peptides] 



Figure 1 Schematic of methods used for locating/predicting epitopes from a model protein sequence, the 19 kDa Mtb antigen. (a) Shows the 
first 100 amino acids (AA) of the 19 kDa sequence, along with published epitopes and peptides that would need to be synthesized to employ 
the overlapping peptide method, assuming 20-mer peptides overlapping by 10 AA. For this method, 9 peptides would be needed, totaling 180 
AA in length. (b) Illustrates the method for predicting OptiMer peptides. Amphipathic peptides (in this example, three AMPHI peptides for a total 
of 52 AA) are located, then extended to include a maximal density of MHC-binding motif matches (three Rough OptiMer peptides, totaling 58 
AA); peptides with low motif densities are then systematically excluded from synthesis, generating OptiMer peptides (two peptides, totaling 33 
AA). (c) Illustrates the method for predicting EpiMer peptides. Clusters of motif density are located within a protein sequence; those clusters 
with a high density of binding motif matches are systematically joined, dependent upon their distance apart, and selected for synthesis (giving, 
in this example, three peptides, totaling 56 AA). Number of binding motifs per 11-residue segment (Y axis) is plotted against the midpoint of 
an 11-residue reading frame (X axis), as described in Methods 



commonly described in the literature. For this 
analysis, we have chosen to derive peptides of 20 
residues each, overlapping by 10 residues. The AMPHI 
algorithm was also used to predict amphipathic peptides 
for our comparative analysis of prediction methods. 

Three measures of the predictive power of each 
epitope prediction method (the overlapping peptide 
method, and the AMPHI, OptiMer, and EpiMer 
algorithms) were used in these comparisons. Efficiency, 
E, defined as 

$$E = \frac{\sum \mathrm{pub}_{\ldots}}{\ldots} \qquad (1)$$

was used to judge the capacity of each method to locate 
published epitopes using the fewest possible amino acid 
residues. Sensitivity, S, defined as 

$$S = \frac{\text{number of published epitopes correlating with predicted peptides}}{\text{total number of published epitopes for a given protein}} \qquad (2)$$

was used to measure the ability of each method to 
predict a maximal number of published T cell epitopes. 
Finally, sensitivity per amino acid, SAA, defined as 

$$SAA = \frac{S}{\text{number of amino acids required to synthesize all predicted peptides}} \qquad (3)$$

was used to scale sensitivity against the number of 
amino acids required to synthesize all peptides chosen 
by a specific method, giving a rough measure of cost. 

For the comparisons, efficiency E, sensitivity S, and 
sensitivity per amino acid SAA, measured for AMPHI-, 
OptiMer-, and EpiMer-predicted peptides, were scored 
against values calculated for the overlapping peptide 
method, to measure the relative improvement of each 
algorithm over this method. 
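
A worked example of sensitivity, sensitivity per amino acid, and the fold improvement over the overlapping peptide method is sketched below, using the counts reported for HIV RT in Table 2c (a single protein, so no averaging across proteins is involved). The function names are illustrative, and the factor of 10^3 used for display is an assumption inferred from the tabulated values.

```python
def sensitivity(n_found, n_published):
    """S: fraction of published epitopes matched by predicted peptides."""
    return n_found / n_published


def sensitivity_per_aa(n_found, n_published, n_residues):
    """SAA: sensitivity divided by the residues needed for all peptides."""
    return sensitivity(n_found, n_published) / n_residues


def fold_over(method_value, overlapping_value):
    """Relative improvement of a method over the overlapping peptides."""
    return method_value / overlapping_value


# HIV RT: overlapping peptides find 7/7 epitopes using 1101 residues;
# EpiMer peptides find 5/7 epitopes using 361 residues.
saa_overlap = sensitivity_per_aa(7, 7, 1101)
saa_epimer = sensitivity_per_aa(5, 7, 361)

print(round(1e3 * saa_overlap, 1))                    # 0.9
print(round(1e3 * saa_epimer, 1))                     # 2.0
print(round(fold_over(saa_epimer, saa_overlap), 1))   # 2.2
```

The rounded results agree with the sensitivity/AA entries and the 2.2-fold improvement listed for EpiMer in Table 2c.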



The OptiMer and EpiMer algorithms were executed in 
Microsoft Excel v4.0 (Microsoft Corporation, 
Redmond, WA) using a Macintosh Quadra 650 (Apple 
Computer, Inc., Cupertino, CA), and are currently 
being implemented in the C++ programming 
language. 

RESULTS 

Prediction of putative Mtb T cell epitopes 

Both the OptiMer and EpiMer algorithms were used 
to predict putative T cell epitopes from within the 
sequences of five Mtb protein antigens. (For an 
illustration of the methods used, see Figures 1 and 2.) As 
all published epitopes for these Mtb proteins were 
located through T cell proliferation assays, which 
measure a class II MHC-restricted response, only the 
list of class II-restricted MHC-binding motifs was used 
by OptiMer and EpiMer to predict putative epitopes. 

[Figure 2 panel: 14 kDa Mtb antigen (length = 134 amino acids)] 

Results for the five Mtb protein antigens studied are 
shown in Table 2a. In all, 34 T cell epitopes matching 
the criteria defined in Methods have been published in 
the literature for the five Mtb proteins studied. OptiMer 
generated 41 putative epitopes for these five proteins, 
totaling 909 amino acids in length; the EpiMer 
algorithm generated 42 putative epitopes, totaling 756 
amino acids in length. These values are in comparison 
to 49 putative epitopes generated by the AMPHI 


[Figure 2 panels: 16 kDa Mtb antigen (length = 143 amino acids) and 19 kDa Mtb antigen (length = 159 amino acids); Y axis, MHC-binding motif matches; X axis, position (AA); EpiMer-predicted putative epitopes are marked on each panel] 

Figure 2 MHC-binding motif density ... for three ... 
acids apart from each other to be combined into one distinct predicted putative epitope. Thus, the clusters centered at amino acids ... 
of the 19 kDa antigen are linked into an EpiMer peptide extending from amino acids 7 to 25 



Table 2 Correlation between AMPHI-, OptiMer- and EpiMer-predicted putative epitopes and published T cell epitopes for eight protein 
antigens 

(a) Correlation between predicted putative epitopes and published epitopes for five Mtb protein antigens (14, 16, 19, 38, 65 kDa), using MHC 
class II versions of OptiMer and EpiMer 

                                            Overlapping    AMPHI         OptiMer       EpiMer 
                                            peptides       peptides      peptides      peptides 
Number of peptides made                     132            49            41            42 
Total number of amino acids required        2618           876           909           756 
Mean efficiency (%)                         44             54            54            54 
Δ Efficiency over overlapping method        -              1.2-fold      1.2-fold      1.2-fold 
Sensitivity (%)                             100 (34/34)    65 (22/34)    53 (18/34)    53 (18/34) 
Mean sensitivity/AA (×10^3)                 2.7            5.0           4.9           5.6 
Δ Sensitivity/AA over overlapping method    -              1.9-fold      1.8-fold      2.[illegible]-fold 

(b) Correlation between predicted putative epitopes and published epitopes for HIV protein antigens nef and gp160, using MHC class I versions 
of OptiMer and EpiMer 

                                            Overlapping    AMPHI         OptiMer       EpiMer 
                                            peptides       peptides      peptides      peptides 
Number of peptides made                     104            36            29            30 
Total number of amino acids required        2077           666           661           614 
Mean efficiency (%)                         35             40            41            40 
Δ Efficiency over overlapping method        -              1.1-fold      1.2-fold      1.1-fold 
Sensitivity (%)                             100 (20/20)    80 (16/20)    75 (15/20)    50 (10/20) 
Mean sensitivity/AA (×10^3)                 1.6            2.9           2.7           3.1 
Δ Sensitivity/AA over overlapping method    -              1.9-fold      1.7-fold      2.0-fold 

(c) Correlation between predicted putative epitopes and published epitopes for HIV protein antigen RT, using MHC class I/II versions of OptiMer 
and EpiMer 

                                            Overlapping    AMPHI         OptiMer       EpiMer 
                                            peptides       peptides      peptides      peptides 
Number of peptides made                     55             23            16            22 
Total number of amino acids required        1101           433           422           361 
Efficiency (%)                              24             26            26            25 
Δ Efficiency over overlapping method        -              1.1-fold      1.2-fold      1.1-fold 
Sensitivity (%)                             100 (7/7)      71 (5/7)      71 (5/7)      71 (5/7) 
Sensitivity/AA (×10^3)                      0.9            1.6           1.7           2.0 
Δ Sensitivity/AA over overlapping method    -              1.8-fold      1.9-fold      2.2-fold 

Number of peptides made: number of peptides potentially synthesized for each algorithm 
Total number of amino acids required: total length, in amino acids, of peptides to be synthesized 
Mean efficiency: mean efficiency E (as explained in Methods) for all proteins in given group 
Δ Efficiency = (Efficiency of prediction method) / (Efficiency of overlapping peptide method for given group of proteins) 
Sensitivity: total sensitivity, as explained in Methods, for all proteins in given group 
Mean sensitivity/AA (as explained in Methods) for all proteins in given group; values have been multiplied by 10^3 for the sake of clarity 
Δ Sensitivity/AA = (Sensitivity/AA of given method) / (Sensitivity/AA of overlapping peptide method for given group) 



algorithm (totaling 878 amino acid residues), and 132 
overlapping peptides (totaling over 2500 residues) 
needed to span each antigen using the overlapping 
peptide method. The OptiMer algorithm predicted 
published T cell epitopes for the five Mtb proteins with 
efficiency and sensitivity per amino acid comparable to 
that of the AMPHI algorithm, and both values exceeded 
those calculated for the overlapping peptide method. 
The EpiMer algorithm predicted published epitopes 
with an efficiency equal to that of either OptiMer or 
AMPHI, and again exceeding that of the overlapping 
peptide method; EpiMer's sensitivity per amino acid 
was the highest of the algorithms tested. 

Prediction of putative HIV T cell epitopes 

Both OptiMer and EpiMer were then used to predict 
T cell epitopes from within the sequences of three HIV 
protein antigens. Epitopes published for the HIV 
protein antigens nef and gp160 were almost exclusively 
class I MHC-restricted, while epitopes published for RT 
were both class I- and class II-restricted. Therefore, a 
version of either OptiMer or EpiMer based on the list of 
class I-restricted MHC-binding motifs was used to 
predict putative epitopes for nef and gp160, while 
versions of both algorithms based on the combined list 
of class I- and class II-restricted motifs were employed 
to predict putative epitopes for the HIV protein antigen 
RT. 

Results for the HIV protein antigens nef and gp160 
are shown in Table 2b. Twenty T cell epitopes matching 
the criteria set forth in Methods have been described in 
the literature for these two antigens. In all, 29 putative 
epitopes were generated by the class I-specific version of 
OptiMer (totaling 661 amino acids in length); 30 
putative epitopes were generated by EpiMer (totaling 
614 amino acids in length). AMPHI generated 36 
putative epitopes (totaling 666 amino acid residues), and 
104 peptides (totaling over 2000 residues in length) 
would have been required by the overlapping peptide 
method. For these two HIV protein antigens, the class I-restricted 
implementations of both OptiMer and 
EpiMer identified published epitopes with efficiency 
comparable to that of AMPHI, and greater than that of 
the overlapping peptide method. Again, the EpiMer 
algorithm's sensitivity per amino acid exceeded that of 
either the OptiMer algorithm or AMPHI. 

Results for the HIV protein RT are shown in Table 
2c. For RT, seven immunodominant T cell epitopes 
matching our criteria have been described in the 
literature. The combined class I/class II implementation 
of OptiMer generated 18 putative epitopes (totaling 422 
amino acids); the same implementation of EpiMer 
generated 22 putative epitopes (totaling 361 amino acids 
in length). These values are in comparison to 23 putative 
epitopes generated by the AMPHI algorithm (totaling 
433 amino acids) and 55 peptides (totaling over 1000 
amino acid residues) required by the overlapping 
peptide method. OptiMer and EpiMer predicted 
published T cell epitopes for the HIV protein RT with 
both efficiency and sensitivity comparable to that of the 
AMPHI algorithm. EpiMer again achieved the highest 
sensitivity per amino acid of the algorithms tested. 

DISCUSSION 

Our laboratory has developed two algorithms, OptiMer 
and EpiMer, which predict putative T cell epitopes from 
protein primary sequences, based on secondary 
structural characteristics and/or the density of MHC-binding 
motifs within the predicted epitopes. We have 
compared these algorithms to previously published 
methods of epitope identification, and have found that 
the algorithms are, on the whole, able to predict T cell 
epitopes from protein primary structure with 
considerable efficiency and sensitivity per amino acid, in 
comparison to the overlapping peptide method (Table ...). 

Both OptiMer and EpiMer have been designed to 
predict peptides which contain clusters of MHC-binding 
motifs (Figure 3). Several strongly immunodominant T 
cell epitopes capable of high-affinity binding to a 
number of different MHC molecules have been 
described [77]; these epitopes are said to exhibit 
"promiscuous" or "degenerate" binding. While the 
OptiMer and EpiMer algorithms do not predict all 
possible T cell epitopes from a given protein antigen, 
they may preferentially predict those epitopes capable of 
binding to multiple MHC alleles. Thus a vaccine 
composed of OptiMer or EpiMer peptides could be 
capable of stimulating an immune response in subjects 
with a variety of genetic backgrounds. 

It has previously been shown that reiterative MHC-binding 
motifs specific to a single allele, situated within 
the same peptide, can greatly enhance the binding of 
that peptide to the associated MHC molecule. The 
success of both OptiMer and EpiMer may be due to the 
striking tendency of overlapping binding motifs for 
multiple MHC alleles to cluster within protein antigen 
sequences. Indeed, as can be seen in Figure 2, relative 
densities as high as 11 motifs within a single 11-residue 
peptide can be located, and densities of 6 or more motifs 
per 11-residue peptide are not uncommon. In contrast, 
other regions of the same protein, as long as 35 residues, 
may contain very few MHC-binding motifs (Figure 2). 
These observations imply that MHC-binding motifs are 
not randomly distributed over a protein sequence, but 
rather tend to occur in clusters that may have great 
practical value as predictors of promiscuous MHC-binding 
peptides. 

As described here, both OptiMer and EpiMer can be 
tailored to predict peptides that contain class I MHC-binding 
motifs, class II MHC-binding motifs, or motifs 
of both classes. For a pathogen that predominantly 
elicits class II-mediated responses, such as Mtb, class II-restricted 
motifs can be used to search the amino acid 
sequences of the pathogen's protein antigens, generating 
a list of peptides with a potentially broad range of 
activity in a variety of immunogenetic contexts. Both 
algorithms can also be implemented at various levels of 
stringency, allowing control over the density of MHC-binding 
motifs required to signal the location of a 
putative epitope, as well as over the number and mean 
length of putative epitopes predicted. When 
implemented at higher levels of stringency, OptiMer and 
EpiMer may be able to decrease the total cost of 
locating T cell epitopes within protein antigens, as well 
as reduce the effort required to synthesize and test these 
putative epitopes, in comparison to the brute force 
method of constructing and testing overlapping 
peptides. 

As shown in Table 3, the EpiMer algorithm, based 
solely on the density of MHC-binding motif matches 
measured along a given protein sequence, has a 
measurably higher efficiency than either the OptiMer or 
AMPHI algorithm for proteins which have been 
extensively mapped for T cell epitopes, namely, the 19 
and 65 kDa antigens. This improved performance may 
indicate that, in fact, the EpiMer algorithm will prove to 
be the more useful of the two novel algorithms in the 
prediction of T cell epitopes. Work is underway to 
compare EpiMer-predicted putative epitopes to T cell 
epitopes published for a variety of pathogens. 

As we have measured the power of the OptiMer and 
EpiMer algorithms by comparing their predictions to 
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Flaure 3 Comparison ol publiahrt ephope. AMPHh OpUM*-. and Ep,Mer-pred.cied peptides, lor amino acids iTO-100 J* Jf* 
X anilaen s!^ce. MHC-binding monl matches are shown for an EplMer-predteted putative epitope. -MHC+ 
represented as ?S ac.d siart - am.no acd slop position within protein; mol.!-matching amino acid sequence; alleles represented m moi.l- 
rnaiching region, according lo Table 1 ) 
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Table 3 Efficiency, sensitivity, and sensitivity per amino acid values lor five individual Mtb protein antigens 
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12 
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43 
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44 
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40 
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Efficiency, sensitivity, and sensitiviry/AA have been calculated as described In Methods 
-Sensltlvlty/AA has been multiplied by 10 s tor clarity 



published T cell epitopes, our evaluation may have been 
restricted by the fraction of actual epitopes that have 
been discovered and published to date. Predicted 
epitopes which do not currently appear in the literature 
may simply have not yet been assayed experimentally; 
for our analysis, such putative epitopes would have been 
wrongly scored as "false positive" predictions. In 
addition, as both AMPHI and OptiMer employ the 
search for amphipathic peptides, and AMPHI is often 
used by research groups to choose putative epitopes to 
test, we have excluded from our study any published T 
cell epitope discovered initially through the use of the 
AMPHI algorithm. These exclusions may have biased 
our evaluation against the prediction of putative 
epitopes by the algorithms described. 
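
The dependence of such an evaluation on the published record can be made concrete with a small sketch. It assumes, purely for illustration, that a predicted peptide counts as confirmed when it overlaps a published epitope by at least one residue; the overlap rule, the function names, and the residue ranges are hypothetical, and the efficiency and sensitivity measures actually used are those defined in Methods.

```python
def overlaps(a, b, min_overlap=1):
    """True if residue ranges a and b, given as (start, end) with end
    exclusive, share at least min_overlap positions."""
    return min(a[1], b[1]) - max(a[0], b[0]) >= min_overlap


def score_predictions(predicted, published, min_overlap=1):
    """Split predictions into those overlapping a published epitope and
    those that do not, and report two simple fractions."""
    confirmed = [
        p for p in predicted
        if any(overlaps(p, e, min_overlap) for e in published)
    ]
    unconfirmed = [p for p in predicted if p not in confirmed]
    found = [
        e for e in published
        if any(overlaps(p, e, min_overlap) for p in predicted)
    ]
    return {
        "confirmed": confirmed,
        "unconfirmed": unconfirmed,
        "fraction_predictions_confirmed":
            len(confirmed) / len(predicted) if predicted else 0.0,
        "fraction_published_found":
            len(found) / len(published) if published else 0.0,
    }


# Hypothetical residue ranges, not taken from any antigen in this study.
predicted = [(5, 25), (40, 60), (90, 110)]
published = [(10, 20), (95, 105)]

before = score_predictions(predicted, published)
# Suppose an epitope at residues 45-55 is mapped and published later.
after = score_predictions(predicted, published + [(45, 55)])

print(before["unconfirmed"], after["unconfirmed"])
```

In this toy case the prediction at residues 40-60 is scored as unconfirmed only until the overlapping epitope enters the published set, which is the sense in which incomplete mapping can understate predictive performance.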

The successful prediction of putative T cell epitopes 
using either the OptiMer or EpiMer algorithm is 
dependent upon the accuracy of the MHC-binding 
motifs used to search the sequences in question. Not all 
predicted peptides can be expected either to bind to 
MHC molecules with high affinity, or to stimulate 
immune responses both in vitro and in vivo. Published 
motifs have been shown, in some cases, to be inaccurate 
predictors of either peptide-MHC binding, 
immunodominance, or both; only about one-third of 
peptides containing the motif corresponding to a given 
class I MHC allele have been found to be presented by 
that MHC molecule 20 " 40 st \ Certain motifs have been 
refined over time, using new information derived from 
amino acid substitution and peptide truncation 
experiments" 147 . While extensive experimental data 
confirming the accuracy of MHC-binding motifs both in 
vitro and in vivo, as well as data linking predicted 
peptide epitopes to protective immunity, are still 
lacking, the utility of epitope prediction for the 
identification of epitopes that may stimulate a protective 
response was recently demonstrated for a well- 
characterized antigen of Plasmodium falciparum". 

Peptides including amino acid residues that inhibit or 
interfere with MHC binding have recently been 
described' 3,1 *-*'. OptiMer and EpiMer have been 
designed to accommodate future changes in the 
database of known MHC-binding motifs. As individual 
motifs are refined and shown to correlate with MHC- 
binding, immunogenicity, and the induction of 
protective immune responses, the usefulness of multiple 
motif-based epitope prediction methods such as 
OptiMer and EpiMer should dramatically increase. 
Note that the limited sensitivity of either algorithm 
could reflect the small number of MHC molecules for 
which binding motifs are known. As this number 
increases, the sensitivity of both algorithms may 
improve. 

The only true test of the predictive power of the 
OptiMer and EpiMer algorithms will be in the synthesis 
and in vitro testing of predicted epitopes. Putative T cell 
epitopes predicted by either of these novel algorithms 
must be tested in model systems for both 
immunogenicity and positive correlation with 
immunoprotective responses. Synthesis and testing of 
the putative Mtb epitopes described herein are 
underway. 

The OptiMer algorithm combines searches for 
secondary structural characteristics and MHC-binding 
motifs, and predicts amphipathic, "promiscuous" 
putative T cell epitopes that contain motifs for multiple 
MHC alleles. Notably, the EpiMer algorithm, solely 
based on the clustering of MHC-binding motifs within 
protein primary structures, also predicts putative 
"promiscuous" epitopes with a nearly equivalent level of 
predictive success. Both OptiMer and EpiMer can be 
tailored to search for any combination of MHC-binding 
motifs; the algorithms can also be modified to include 
new motifs as they are published. If OptiMer- or 
EpiMer-predicted peptides prove to function as 
immunodominant T cell epitopes in in vitro assays, as 
their positive correlation with published epitopes 
implies, these peptides may be candidates for inclusion 
in synthetic peptide-based vaccines. OptiMer and 
EpiMer may therefore provide sensitive and efficient 
means for the prediction of T cell epitopes crucial to the 
development of vaccines against Mtb, HIV, and a 
number of other pathogens. 

NOTE ADDED IN PROOF 

EpiMer predictions have been performed for a total of 
20 different protein antigens; comparison with 
published epitopes revealed a 2.37-fold (range 
0.86-3.29) improvement in sensitivity per amino acid 
over the overlapping method (G. Meister et al., 
unpublished results). 

Twenty-eight EpiMer-predicted Mtb peptides have 
been tested using PBMC from Mtb-immune donors. Sixteen 
of the 28 peptides were immunogenic in vitro (B. 
Edelson et al., unpublished results). 